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1 Introduction 


In Gramacy et al. (2015), the authors are to be congratulated for providing a statistically- 


motivated practical perspective to the important problem of optimizing a black-box func¬ 
tion over a constrained region. They assume that the constraint sets are expensive to 
evaluate and propose a solution to this problem using an Augmented Lagrangian (AL) 


with Gaussian Process (GP) emulation and an Expected Improvement criterion (Jones 


et ah, 1998) for constrained minimization. As pointed out by Gramacy et al. (2015), sta¬ 


tistical optimization is a massively expanding field and their paper addresses a significant 
problem within that field. The majority of this discussion is focused on two particular 
choices in modeling, the first is the interesting idea of removing the maximum from 
the third term in the AL and the second is related to addressing correlated constraint 
functions. Section [2] presents results from some numerical experiments and discusses the 
theoretical implications of dropping the maximum from the AL. Section [3] considers the 
potential impact on the authors’ proposed expected improvement criterion of accounting 
for correlation among emulated constraints. 
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2 The ‘NoMax’ approach 


Gramacy et al. (2015) motivate their methods using the simple “toy problem” of Section 


4.3, specifically min f(x) subject to Ci(x) < 0 and c 2 (a;) < 0, where 


f(x)=x 1+X 2 , Ci(x) = 2 ~Xi -2x 2 - -sm{2n(xl ~2x 2 )) , c 2 (x) = x\ + x\ - - 


We shall further explore this problem by considering two new objective functions. First, 
we consider the linear objective 


Version 1: f(x) = x\ — 

where the solution —1 at the location (0,1) lies on the boundary of the parameter space 
as opposed to the boundary defined by the constraints. In the second version we consider 
a nonlinear objective 


Version 2: f(x) = \(xi — 0.6) 2 + ( x 2 — 0.6) s 


where the solution 0 at the point (0.6, 0.6) lies in the interior of the constraint set. 

For testing purposes we consider three different optimization strategies. Strategy 
Random is a naive approach of taking 100 uniform random samples and selecting the 
smallest function value that satisfies the constraints. Strategy NoMax uses the default 


setting of optim.auglag in the R package laGP (Gramacy, 2015), which removes the 


maximum from the AL. Finally, strategy WithMax uses optim.auglag with the option 


nomax=0, which retains the maximum in the AL. We follow the same setup as Gramacy 
et al. (2015) and consider 100 restarts of the algorithm. The plot in Figure [I] shows 
the results for both Versions 1 and 2 of the toy problem using strategies NoMax and 
WithMax. Table [l] shows the results of minimizing both versions of the toy problem 
using each of the three strategies. 

Figure [l] and Table [l] highlight two points. First, at least in one case, the algorithm 
can be outperformed by a simple random search. It is likely that this is partly reliant on 
the very low dimension of the toy problem, so should not be taken too seriously. Second, 
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Figure 1: Panel 1 (left) shows the results of solving Version 1 of the toy problem and 
Panel 2 (right) shows the results of solving Version 2 of the toy problem. Black line is 
the solution of strategy NoMax and the gray line is the solution of strategy WithMax. 

Table 1: Average value of the function after 100 function calls. 



Random 

NoMax 

WithMax 

fix) = X\ - x 2 

-0.874 

-0.851 

-0.832 

f(x) = l(xi - 0.6) 2 + (x 2 - 0.6) 2 

0.0014 

0.0113 

0.0084 


when the test problem is nonlinear, the NoMax is outperformed by the WithMax. 
This is actually expected, as explained below. 

Recall from classic AL, 


La(x ; A, p) := f(x) + A T c(ir) + — J^(max{0, d(x)}) 2 , 

P i =1 


and the NoMax version (Gramacy et al. 2015), 


L7(x] A, p) ■= f(x) + A T c(x) + — ^(Q(a;)) 2 . 

P i =1 

The reason the AL is effective in practice is captured in the following Theorem. 
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Theorem 2.1 Let f E C 2 and Ci E C 2 for i = 1,2, ...,m. Consider the problem of 


min{/(x) : Ci{x) < 0, i — x E 13}. 


( 1 ) 


Fix any p > 0. Suppose the Linear Independence Constraint Qualification (see (Nocedal 
and Wright, 2006, Def 12. f)) holds at x. Then x is a critical point of Problem 0 if 
and only if there are Lagrange multipliers A > 0 such that (x, A) is a critical point of the 
problem 


min max L Ax; A, p). 

x A>0 


( 2 ) 


Proof: It is immediately clear that Problem 0 is exactly equivalent to 


^ III 

min {/(a;) + — y^(max{0, Cj(x)}) 2 : Ci(x) < 0, i=l,2,...,m, x E B}. (3) 

2p *=i 

Problem 0 can now be viewed as the Lagrangian of Problem 0. The equivalence of 
Problem 0 and Problem 0 now follows from standard Lagrange duality (see (Nocedal 


and Wright, 2006, Thm. 12.1) for example). □ 


This theorem fails for the NoMax version of the AL. In order to illustrate we consider 
the one-dimensional problem 


min{(x — 0.5) : x" — 1 < 0} 


The minimizer of this problem is clearly unique and equal to x — 0.5. The NoMax 
version of the AL for this problem is 

min maxLT(r; A, p) = min max 1 (x — 0.5) 2 + X(x 2 — 1) + — (x 2 — l) 2 
x A>0 x a>o ( 2 p 

Examining max^> 0 {A(x 2 — 1)}, we must have ( x 2 — 1) < 0, or A will be driven to infinity 
and we are clearly not at the minimum with respect to x. As such, we may reduce our 
AL subproblem to the one-dimensional problem 

min | (x — 0.5) 2 + -^—(x 2 — l) 2 : — 1 < x < 1 
x } 2/9 
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In Figure [ 2 ] we provide several plots of (x — 0.5) 2 + j^(x 2 — l) 2 , using various values of 
p. The effect is clear. As p is decreased, the optimal point of the NoMax AL is driven 
away from the true minimum and towards the boundary of the constraint set (note the 
algorithm drives p to 0 whenever stagnation is detected). Moreover, regardless of how 
large p becomes, min x maxA>o L^(x‘, A, p) will never return x = 0.5 in this example. 



Figure 2: Plots of (x — 0.5) 2 + ^(x 2 — l) 2 for p = 10 (left), p — 1 (center) and p — 0.1 
(right). 


Why then does the NoMax version work for Gramacy et al. (2015)? The reason 
lies in the use of the linear objective function. In particular, if the objective function is 
linear (and nonzero), then the minimizer(s) will lie on the boundary of the constraint 
set. Indeed, any point x in the interior of the constraint set will have a nonzero gradient, 
so Fermat’s Theorem confirms it cannot be a minimizer. 

We point out these issues not as a flaw in the paper, but to highlight the importance of 
the assumption that the objective function be linear. On the positive side, the assumption 
is not particularly demanding. After all, if the objective function is nonlinear, one can 
always rewrite the optimization with a linear objective: 


min{/(x) :i6H} = min{r : r > f(x),x E f2}. 


Following along these lines we consider Version 3 of the toy problem, a modification 
of Version 2 taking f(x) = x 3 with constraints C\{x), c 2 (x) above and the additional 
constraint c 3 (x) < 0, where 

c 3 {x) = ^(xi - 0.6) 2 + (x 2 - 0.6) 2 - x 3 . 
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Again we use the same three fitting strategies as above. The plot in Figure [3] shows 
the results using strategies NoMax and WithMax. The average function value using 
Random was found to be 0.046 compared to average function values of 0.063 using 
NoMax and 0.069 using WithMax. 



Figure 3: Black line is the solution of NoMax and the gray line is the solution of 
WithMax for Version 3 of the toy problem. 


What is clear from each of these three simulations is that the constrained optimization 


algorithm proposed in Gramacy et al. (2015) shows promise but there is still work to 
be done in terms of testing and improving the method to deal with more complicated 
situations. It should be noted that, if there are multiple constraint functions, with 
some active and some inactive at the minimizer, the NoMax version may still present 
difficulties, as it may pull points towards inactive constraints. 
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3 Correlated Outputs 


We address the potential impact of explicitly modeling cross correlations among emulated 
outputs. The authors considered only the independent output case, while acknowledging 
the potential for algorithmic efficiency improvements via such considerations. As before, 


our brief investigation was carried out on the toy problem analyzed in Section 4.3 of Gra- 


rnacy et al. (2015). The two constraint functions of this problem evaluated at a random 


sample of 1000 input pairs exhibited a correlation of approximately —0.8, suggesting the 
potential for improved emulation via modeling cross correlation in this application. To 
keep algorithmic computation times at the same order as the independent output case, 
our correlated output model for constraints C\ and C 2 was formulated as a simplified linear 
model of coregionalization (LMC), 


= A r 


where U\ and U 2 are independent GPs and the fixed 2x2 coefficient matrix A n is calculated 
from the n current observations of the two constraints placed in the n x 2 matrix C n 
as follows: Compute the “economy size” singular value decomposition C n = U n T, n V^ 
and set A n = V n Ti n . The two orthogonal columns of U n are transformed constraint 
observations modeled by the independent processes U\ and « 2 - A more general LMC 




such as discussed in Fricker et al. (2013) could be entertained as an improvement to our 


rudimentary approach, which will suffice for the current discussion. 

Figure [4] shows the minimum objective value versus iteration for the scenario con¬ 
sidered in Gramacy et al. (2015), namely an initial design of 10 runs followed by 100 
iterations of Algorithm 1. These results are averaged over five initial Latin hypercube 
designs generated from the Matlab lhsdesign function with different seeds. This small 
study suggests little to be gained from cross correlation modeling. 

It may be the case that the additional complication of modeling cross correlations 
is mitigated with larger initial designs at the expense of expected improvement itera¬ 
tions given a fixed budget. Figure [5] shows the minimum objective value versus iteration 
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Figure 4: Constrained minimum objective function value averaged over five 10-run initial 
designs for independent (black) and correlated (gray) output models. 

averaged over five initial Latin hypercube designs of 20 runs generated as in the previ¬ 
ous study followed by 90 iterations of Algorithm 1. It appears in this case that cross 
correlation modeling could be beneficial for accelerating convergence to the constrained 
minimizer. We emphasize again that the less sophisticated LMC model considered here 
requires roughly the same computational effort to implement as the independent output 
model. 

Clearly no definitive conclusions can be reached regarding the efficacy of cross cor¬ 
relation modeling due to the small size and limited scope of the studies we conducted. 
Nevertheless, such modeling appears to do no harm and is worthy of further study as 
a potentially useful alternative in expected improvement algorithms involving emulation 
of multiple correlated outputs. 
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Figure 5: Constrained minimum objective function value averaged over five 20-run initial 
designs for independent (black) and correlated (gray) output models. 

action/equal opportunity employer, is operated by Los Alamos National Security, LLC, 
for the National Nuclear Security Administration of the U.S. Department of Energy under 
contract DE-AC52-06NA25396. 


References 

Flicker, T. E., Oakley, J. E., and LIrban, N. M. (2013), “Multivariate Gaussian Process 
Emulators With Nonseparablc Covariance Structures,” Technometrics , 55, 47-56. 

Gramacy, R. B. (2015), laGP: Local Approximate Gaussian Process Regression, R pack¬ 
age version 1.1-3. 

Gramacy, R. B., Gray, G. A., Digabel, S. L., Lee, H. K. H., Ranjan, P., Wells, G., 
and Wild, S. M. (2015), “Modeling an Augmented Lagrangian for Improved Blackbox 
Constrained Optimization,” Technometrics , 61, 1-38. 

Jones, D. R., Schonlau, M., and Welch, W. J. (1998), “Efficient Global Optimization of 
Expensive Black-Box Functions,” Journal of Global Optimization, 13, 455-492. 

Nocedal, J. and Wright, S. J. (2006), Numerical Optimization, Second Edition, New York: 
Springer Series in Operations Research and Financial Engineering. 


9 




